
Conversation

@adoroszlai (Contributor)

What changes were proposed in this pull request?

The default topology contains a single rack. With rack-aware container placement policy (HDDS-8300), overreplication is considered a misreplication, since more replicas are in a single rack than desired. Yet misreplication cannot be resolved since there is no other rack.

This can be reproduced by configuring rack-awareness for integration tests:

diff --git hadoop-ozone/integration-test/src/test/resources/ozone-site.xml hadoop-ozone/integration-test/src/test/resources/ozone-site.xml
index 0c5ae1fa88..87a85dc1f1 100644
--- hadoop-ozone/integration-test/src/test/resources/ozone-site.xml
+++ hadoop-ozone/integration-test/src/test/resources/ozone-site.xml
@@ -31,4 +31,9 @@
     <value>4</value>
   </property>
 
+  <property>
+    <name>ozone.scm.container.placement.impl</name>
+    <value>org.apache.hadoop.hdds.scm.container.placement.algorithms.SCMContainerPlacementRackAware</value>
+  </property>
+
 </configuration>

and running:

$ mvn -am -pl :ozone-integration-test -Dtest='TestDecommissionAndMaintenance#testContainerIsReplicatedWhenAllNodesGotoMaintenance' clean test
...
	at org.apache.hadoop.ozone.scm.node.TestDecommissionAndMaintenance.waitForContainerReplicas(TestDecommissionAndMaintenance.java:757)
	at org.apache.hadoop.ozone.scm.node.TestDecommissionAndMaintenance.testContainerIsReplicatedWhenAllNodesGotoMaintenance(TestDecommissionAndMaintenance.java:431)
...
[ERROR] Errors: 
[ERROR]   TestDecommissionAndMaintenance.testContainerIsReplicatedWhenAllNodesGotoMaintenance:431->waitForContainerReplicas:757 » Timeout

Container placement should be considered valid (not misreplicated) if there is only a single rack. This will let the overreplication logic take care of the extra replicas instead of the misreplication one.

https://issues.apache.org/jira/browse/HDDS-8383
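The intended rule above can be sketched as follows (a minimal illustration with hypothetical names, not the actual Ozone classes): on a single-rack topology, rack placement counts as satisfied as long as at least one datanode holds a replica, so surplus copies are left to the over-replication handler instead of being flagged as mis-replicated.

```java
// Hypothetical sketch of the single-rack rule described above.
public class SingleRackPlacementSketch {

    static boolean isPlacementSatisfied(int numRacks, int replicaNodes) {
        if (numRacks <= 1) {
            // Nothing to spread across: valid iff any replica exists.
            return replicaNodes > 0;
        }
        // Simplified multi-rack stand-in; the real policy compares the
        // number of racks used against the number of racks required.
        return replicaNodes >= 2;
    }

    public static void main(String[] args) {
        // Four replicas on a single-rack cluster: placement is valid, so
        // only the over-replication logic needs to remove the extras.
        System.out.println(isPlacementSatisfied(1, 4)); // true
        System.out.println(isPlacementSatisfied(1, 0)); // false
    }
}
```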

How was this patch tested?

With this change, the same integration test passes. Unit tests related to topology and placement policy were also run.

https://github.com/adoroszlai/hadoop-ozone/actions/runs/4617349558

@siddhantsangwan (Contributor)

Going through this code in SCMCommonPlacementPolicy:

  public ContainerPlacementStatus validateContainerPlacement(
      List<DatanodeDetails> dns, int replicas) {
    NetworkTopology topology = nodeManager.getClusterNetworkTopologyMap();
    // We have a network topology so calculate if it is satisfied or not.
    int requiredRacks = getRequiredRackCount(replicas);
    int numRacks = topology != null ? topology.getRackCount() : 1;
    if (numRacks == 1 || replicas == 1 || requiredRacks == 1) {
      if (dns.size() > 0) {
        // placement is always satisfied if there is at least one DN.
        return validPlacement;
      } else {
        return invalidPlacement;
      }
    }

I'm wondering if getRequiredRackCount(replicas) should return 1 instead of 2 in this case. Currently, SCMContainerPlacementRackAware returns the constant REQUIRED_RACKS, which is 2.

@adoroszlai adoroszlai requested a review from sodonnel April 10, 2023 07:21
@adoroszlai (Contributor Author)

adoroszlai commented Apr 10, 2023

Thanks @siddhantsangwan for taking a look.

> I'm wondering if getRequiredRackCount(replicas) should return 1 instead of 2 in this case?

Good point, it might be a more complete fix, and SCMContainerPlacementRackScatter already does that. Also, there are a few places using min(numRacks, requiredRacks), so we might as well get rid of those.
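To illustrate the idea (a simplified sketch with assumed names, not the real SCMContainerPlacementRackAware): capping the result inside getRequiredRackCount() itself means call sites no longer need their own min(numRacks, requiredRacks).

```java
// Sketch of the capping approach; REQUIRED_RACKS mirrors the constant in
// the rack-aware policy, the rest is simplified for illustration.
public class RequiredRackCountSketch {

    static final int REQUIRED_RACKS = 2;

    // rackCount <= 0 stands in for "no topology available".
    static int getRequiredRackCount(int rackCount) {
        int racks = rackCount > 0 ? rackCount : 1;
        // Never require more racks than the cluster actually has, so a
        // single-rack cluster is no longer permanently mis-replicated.
        return Math.min(REQUIRED_RACKS, racks);
    }

    public static void main(String[] args) {
        System.out.println(getRequiredRackCount(1)); // single rack -> 1
        System.out.println(getRequiredRackCount(5)); // enough racks -> 2
        System.out.println(getRequiredRackCount(0)); // unknown topology -> 1
    }
}
```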

@kerneltime (Contributor)

@swamirishi @ashishkumar50 can you please take a look?

@siddhantsangwan (Contributor) left a comment:

Looks good other than a minor comment.

Comment on lines 80 to 82
-    stat = new ContainerPlacementStatusDefault(1, 4, 3, 1, Arrays.asList(1, 2));
+    stat = new ContainerPlacementStatusDefault(1, 4, 1, Arrays.asList(1, 2));
assertFalse(stat.isPolicySatisfied());
assertEquals(2, stat.misReplicationCount());

The original code here doesn't make much sense to me: it says currentRacks is 1, but the last argument says one replica is on one rack and two replicas are on another rack.

@swamirishi (Contributor) left a comment:

This PR changes a bunch of interfaces, altering the contracts of those classes; it would be better to discuss this first.

  List<? extends Node> sortByDistanceCost(Node reader,
      List<? extends Node> nodes, int activeLen);

  default int getRackCount() {

Should we have the concept of racks in NetworkTopology? Should this particular function go in the PlacementPolicy class instead?


@sodonnel I remember us discussing this before.

   }
-  int maxLevel = networkTopology.getMaxLevel();
-  int numRacks = networkTopology.getNumOfNodes(maxLevel - 1);
+  int numRacks = networkTopology.getRackCount();

I guess we can move this particular logic into SCMCommonPlacementPolicy instead of relying on NetworkTopology, since the NetworkTopology class is meant to be more generic and need not understand racks.

   protected int getRequiredRackCount(int numReplicas) {
-    return REQUIRED_RACKS;
+    int racks = networkTopology != null ? networkTopology.getRackCount() : 1;
+    return Math.min(REQUIRED_RACKS, racks);

From my understanding, this is the only change that fixes the particular issue. I think it would be better to create a separate refactoring Jira if you want to change the other interfaces.

@adoroszlai adoroszlai marked this pull request as draft April 11, 2023 06:54
@adoroszlai adoroszlai marked this pull request as ready for review April 11, 2023 07:44
@sodonnel (Contributor) left a comment:

LGTM

@adoroszlai (Contributor Author)

@swamirishi would you like to take another look?

@swamirishi (Contributor) left a comment:

LGTM

@adoroszlai adoroszlai merged commit 45e8c5c into apache:master Apr 11, 2023
@adoroszlai adoroszlai deleted the HDDS-8383 branch April 11, 2023 16:38
@adoroszlai (Contributor Author)

Thanks @siddhantsangwan, @sodonnel, @swamirishi for the review.

Labels: bug, scm